Command Line Interface Utilities
Introduction
主要用来管理作业状态
Oozie Command Line Usage
环境变量:
OOZIE_URL
: 默认'-oozie
' 选项值
OOZIE_TIMEZONE
: 默认'-timezone
' 选项值'
OOZIE_AUTH
: 默认'-auth
' 选项值'
'自定义头信息:使用'-Dheader:NAME=VALUE
'
oozie version
: 显示客户端版本
作业操作:
oozie job <OPTIONS> :
-action <arg> coordinator rerun/kill on action ids (requires -rerun/-kill); coordinator log retrieval on action ids (requires -log)
-allruns Get workflow jobs corresponding to a coordinator action
including all the reruns
-auth <arg> select authentication type [SIMPLE|KERBEROS]
-change <arg> change a coordinator/bundle job
-config <arg> job configuration file '.xml' or '.properties'
-D <property=value> set/override value for given property
-date <arg> coordinator/bundle rerun on action dates (requires -rerun)
-definition <arg> job definition
-doas <arg> doAs user, impersonates as the specified user
-dryrun Dryrun a workflow (since 3.3.2) or coordinator (since 2.0) job without actually executing it
-info <arg> info of a job
-kill <arg> kill a job (coordinator requires -action or -date)
-len <arg> number of actions to be returned, used for pagination(default 1000, requires -info)
-filter <arg> All coordinator actions satisfying the filter will be retrieved.
Filter is of the format <key><comparator><value>[;<key><comparator><value>]*
key: status or nominaltime
comparator: =, !=, <, <=, >, >=
value: valid status like SUCCEEDED, KILLED, RUNNING etc. Only = and != apply for status
nominalTime is valid date of the format yyyy-MM-dd'T'HH:mm'Z' (like 2014-06-01T00:00Z)
Filter with '=' is concatenated with 'OR' and other filters are concatenated with 'AND'.
Currently, supported only for coordinator job.
-order <arg> order to show coordinator actions (default ascending order, 'desc' for descending order, requires -info)
Currently, only supported for coordinator job.
-localtime use local time (same as passing your time zone to -timezone).
Overrides -timezone option
-log <arg> job log
-nocleanup do not clean up output-events of the coordinator rerun actions
(requires -rerun)
-offset <arg> offset of actions returned relative to all actions matching the filter criteria,
used for pagination (default '1', requires -info)
-oozie <arg> Oozie URL
-refresh re-materialize the coordinator rerun actions (requires -rerun)
-rerun <arg> rerun a job (coordinator requires -action or -date; bundle requires -coordinator or -date)
-resume <arg> resume a job
-run run a job
-start <arg> start a job
-submit submit a job
-suspend <arg> suspend a job
-timezone <arg> use time zone with the specified ID (default GMT).
See 'oozie info -timezones' for a list
-value <arg> new endtime/concurrency/pausetime value for changing a
coordinator job; new pausetime value for changing a bundle job
-verbose verbose mode
-update Update coordinator definition and properties
-logfilter job log search parameter. Can be specified as -logfilter opt1=val1;opt2=val1;opt3=val1.
Supported options are recent, start, end, loglevel, text, limit and debug.
-ignore <arg> ignore a coordinator job or action
(requires '-action' to
作业状态:
oozie jobs <OPTIONS> :
-auth <arg> 选择验证类型 [SIMPLE|KERBEROS]
-doas <arg> 指定用户操作
-oozie <arg> Oozie URL
-filter <arg> user=<U>\;name=<N>\;group=<G>\;status=<S>\;...
-jobtype <arg> 作业类型(仅支持Oozie-2.0或之后版本ONLY - coordinator' 或 'wf' (default))
-len <arg> 作业数 (default '100')
-localtime 本地时间,将覆盖-timezone 选项
-offset <arg> jobs offset (default '1')
-timezone <arg> 指定时区
-verbose verbose mode
检查多个工作流作业状态:
$ oozie jobs -oozie http://localhost:11000/oozie -localtime -len 2 -filter status=RUNNING
Job Id Workflow Name Status Run User Group Created Started Ended
.----------------------------------------------------------------------------------------------------------------------------------------------------------------
4-20090527151008-oozie-joe hadoopel-wf RUNNING 0 joe other 2009-05-27 15:34 +0530 2009-05-27 15:34 +0530 -
0-20090527151008-oozie-joe hadoopel-wf RUNNING 0 joe other 2009-05-27 15:15 +0530 2009-05-27 15:15 +0530 -
可用filter有:
name: the workflow application name from the workflow definition.
user: the user that submitted the job.
group: the group for the job.
status: the status of the job.
frequency: the frequency of the Coordinator job.
unit: the time unit. It can take one of the following four values: months, days, hours or minutes. Time unit should be added only when frequency is specified.
检查多个Coordinator 作业状态:
$ oozie jobs -oozie http://localhost:11000/oozie -jobtype coordinator
检查多个bundle作业状态:
$ oozie jobs -oozie http://localhost:11000/oozie -jobtype bundle
bulk监控用于Bundle运行的作业
$ oozie jobs -oozie http://localhost:11000/oozie -bulk bundle=bundle-app-1
$ oozie jobs -oozie http://localhost:11000/oozie -bulk 'bundle=bundle-app-2;actionstatus=SUCCEEDED' -verbose
bundle: Bundle app-name (mandatory)
coordinators: multiple, comma-separated Coordinator app-names. By default, if none specified, all coordinators pertaining to the bundle are included
actionstatus: status of jobs (default job-type is coordinator action aka workflow job) you wish to filter on. Default value is (KILLED,FAILED)
startcreatedtime/endcreatedtime: specify boundaries for the created-time. Only jobs created within this window are included.
startscheduledtime/endscheduledtime: specify boundaries for the nominal-time. Only jobs scheduled to run within this window are included.
$ oozie jobs <OPTIONS> : jobs status
-bulk <arg> key-value pairs to filter bulk jobs response. e.g.
bundle=<B>\;coordinators=<C>\;actionstatus=<S>\;
startcreatedtime=<SC>\;endcreatedtime=<EC>\;
startscheduledtime=<SS>\;endscheduledtime=<ES>\;
coordinators and actionstatus can be multiple comma
separated values. "bundle" and "coordinators" are 'names' of those jobs.
Bundle name is mandatory, other params are optional
管理员操作:
oozie admin <OPTIONS> :
-auth <arg> 选择验证类型 [SIMPLE|KERBEROS]
-doas <arg> 指定用户操作
-oozie <arg> Oozie URL
-queuedump 显示Oozie服务器队列元素
-servers 可用Oozie服务器列表(启用HA时,超过一个)
-status 当前系统状态
-systemmode <arg> 支持Oozie-2.0或之后版本.改变系统模式[NORMAL| NOWEBSERVICE|SAFEMODE]
-version 显示Oozie服务器编译版本
-shareliblist 列出指定工作流活动的可用共享库
-sharelibupdate 更新服务器使用新的共享库版本
检查Oozie系统状态
$ oozie admin -oozie http://localhost:11000/oozie -status
Safemode: OFF
改变Oozie系统状态
$ oozie admin -oozie http://localhost:11000/oozie -systemmode [NORMAL|NOWEBSERVICE|SAFEMODE]
Safemode: ON
显示Oozie系统编译版本
$ oozie admin -oozie http://localhost:11000/oozie -version
Oozie server build version: 2.0.2.1-0.20.1.3092118008--
显示Oozie系统队列:用于管理员调试目的
$ oozie admin -oozie http://localhost:11000/oozie -queuedump[Server Queue Dump]:(coord_action_start,1),(coord_action_start,1),(coord_action_start,1)(coord_action_ready,1)(action.check,2)
显示Oozie服务器列表:(HA)
$ oozie admin -oozie http://hostA:11000/oozie -servers
hostA : http://hostA:11000/oozie
hostB : http://hostB:11000/oozie
hostC : http://hostC:11000/oozie
SLA Operations 获取SLA 事件列表 $ oozie sla -oozie http://localhost:11000/oozie -len 3 获取特定序列SLA事件 $ oozie sla -oozie http://localhost:11000/oozie -offset 2 -len 1 使用过滤获取SLA事件信息 $ oozie sla -filter jobid=0000000-130130150445097-oozie-joe-C@1\;appname=aggregator-sla-app -len 1 -oozie http://localhost:11000/oozie 可用filter有:jobid,appname
验证操作:
验证WF xml文件:
$ oozie validate myApp/workflow.xml
可用共享库列表
$ oozie admin -oozie http://localhost:11000/oozie -shareliblist
$ oozie admin -oozie http://localhost:11000/oozie -sharelib pig*
更新共享库
$ oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
指定主题详细信息:
oozie info <OPTIONS> :
-timezones display a list of available time zones
获取时区列表:
$ oozie info -timezones
Pig作业:
oozie pig <OPTIONS> -X <ARGS> :
'-X'后的参数将被提交给pig, '-X'后'-D' 参数将放入<configuration>中
-auth <arg> 选择验证类型 [SIMPLE|KERBEROS]
-config <arg> 作业配置文件,文件名后缀'.properties'
-D <property=value> 设置或覆盖给定属性的值
-doas <arg> 指定用户操作
-oozie <arg> Oozie URL
-file <arg> Pig script
-P <property=value> 为脚本设置参数
例子:
$ oozie pig -file pigScriptFile -config job.properties -Dfs.default.name=hdfs://localhost:8020 -PINPUT=/user/me/in -POUTPUT=/user/me/out -X -Dmapred.job.queue.name=UserQueue -param_file params
.
job: 14-20090525161321-oozie-joe-W
.
$cat job.properties
fs.default.name=hdfs://localhost:8020
mapreduce.jobtracker.kerberos.principal=ccc
dfs.namenode.kerberos.principal=ddd
oozie.libpath=hdfs://localhost:8020/user/oozie/pig/lib/
Hive Operations
oozie hive <OPTIONS> -X<ARGS> :
'-X'后的参数将被提交给 hive, '-X'后'-D' 参数将放入<configuration>中
-auth <arg> 选择验证类型 [SIMPLE|KERBEROS]
-config <arg> 作业配置文件,文件名后缀'.properties'
-D <property=value> 设置或覆盖给定属性的值
-doas <arg> 指定用户操作
-oozie <arg> Oozie URL
-file <arg> hive script
-P <property=value> set parameters for script
$ oozie hive -file HIVE-SCRIPT -config OOZIE-CONFIG [-Dkey=value] [-Pkey=value] [-X [-Dkey=value opts for Launcher/Job configuration] [Other opts to pass to Hive]]
例子:
$ oozie hive -file hiveScriptFile -config job.properties -Dfs.default.name=hdfs://localhost:8020 -PINPUT=/user/me/in -POUTPUT=/user/me/out -X -Dmapred.job.queue.name=UserQueue -v
job: 14-20090525161321-oozie-joe-W
$cat job.properties
fs.default.name=hdfs://localhost:8020
mapreduce.jobtracker.kerberos.principal=ccc
dfs.namenode.kerberos.principal=ddd
oozie.libpath=hdfs://localhost:8020/user/oozie/hive/lib/
Sqoop操作
oozie sqoop <OPTIONS> -X<ARGS> :
'-X'后'-D' 参数将放入<configuration>中
-auth <arg> 选择验证类型 [SIMPLE|KERBEROS]
-config <arg> 作业配置文件,文件名后缀'.properties'
-D <property=value> 设置或覆盖给定属性的值
-doas <arg> 指定用户操作
-oozie <arg> Oozie URL
-command <arg> sqoop command
例子:
$ oozie sqoop -oozie http://localhost:11000/oozie -Dfs.default.name=hdfs://localhost:8020 -command import --connect jdbc:mysql://localhost:3306/oozie --username oozie --password oozie --table WF_JOBS --target-dir '/user/${wf:user()}/${examplesRoot}/output-data/sqoop' -m 1 -config job.properties -X -Dmapred.job.queue.name=default
.
job: 14-20090525161322-oozie-joe-W
.
自由格式例子:
$ oozie sqoop -oozie http://localhost:11000/oozie -command import --connect jdbc:mysql://localhost:3306/oozie --username oozie --password oozie --query "SELECT a.id FROM WF_JOBS a WHERE \$CONDITIONS" --target-dir '/user/${wf:user()}/${examplesRoot}/output-data/sqoop' -m 1 -config job.properties -X -Dmapred.job.queue.name=default
.
job: 14-20090525161321-oozie-joe-W
.
$cat job.properties
fs.default.name=hdfs://localhost:8020
mapreduce.jobtracker.kerberos.principal=ccc
dfs.namenode.kerberos.principal=ddd
oozie.libpath=hdfs://localhost:8020/user/oozie/sqoop/lib/
MR操作:
oozie mapreduce <OPTIONS> :
-auth <arg> 选择验证类型 [SIMPLE|KERBEROS]
-config <arg> 作业配置文件,文件名后缀'.properties'
-D <property=value> 设置或覆盖给定属性的值
-doas <arg> 指定用户操作
-oozie <arg> Oozie URL
$ oozie mapreduce -oozie http://localhost:11000/oozie -config job.properties
配置文件中必须指定:
mapred.mapper.class ,mapred.reducer.class ,
mapred.input.dir ,mapred.output.dir ,
oozie.libpath, mapred.job.tracker , fs.default.name
Common CLI Options
Authentication
Impersonation, doAs
Oozie URL
-oozie OOZIE_URL
指定运行Oozie的系统,OOZIE_URL
可以在环境变量中设置。如果都不提供,CLI fail
Time zone
-timezone TIME_ZONE_ID
指定时区;oozie info -timezones
获取可用时区列表; -localtime
和-timezone TIME_ZONE_ID
都指定,则-localtime
将覆盖后者; 均未给定,这将查找 OOZIE_TIMEZONE
环境变量; 上述均为给定,或非法时区,将使用GMT
Debug Mode
OOZIE_DEBUG=1
在CLI将输出详细任意执行的详细信息
CLI retry
Job Operations
提交作业
$ oozie job -oozie http://localhost:11000/oozie -config job.properties -submit
job: 14-20090525161321-oozie-joe
-config
指定xml配置文件;并且wf必须指定oozie.wf.application.path
属性;coordinator 指定oozie.coord.application.path
属性;bundle 指定oozie.bundle.application.path
;并且指定路径必须是一个HDFS路径。当前作业处于 PREP
状态
启动作业
$ oozie job -oozie http://localhost:11000/oozie -start 14-20090525161321-oozie-joe
作业将处于 PREP
状态
运行作业
$ oozie job -oozie http://localhost:11000/oozie -config job.properties -run
job: 15-20090525161321-oozie-joe
run
选项用来创建和启动作业,作业参数必须在属性文件中提供,并指定 -config
选项;
-config
指定xml配置文件;并且wf必须指定oozie.wf.application.path
属性;coordinator 指定oozie.coord.application.path
属性;bundle 指定oozie.bundle.application.path
;并且指定路径必须是一个HDFS路径。作业将处于 RUNNING
状态;
暂停作业
$ oozie job -oozie http://localhost:11000/oozie -suspend 14-20090525161321-oozie-joe
作业将处于 SUSPENDED
状态;暂停 coordinator/bundle作业的 RUNNING , RUNNIINGWITHERROR or PREP
状态.
The suspend option suspends a coordinator/bundle job in RUNNING , RUNNIINGWITHERROR or PREP status. When the coordinator job is suspended, running coordinator actions will stay in running and the workflows will be suspended. If the coordinator job is in RUNNING status, it will transit to SUSPENDED status; if it is in RUNNINGWITHERROR status, it will transit to SUSPENDEDWITHERROR ; if it is in PREP status, it will transit to PREPSUSPENDED status.
When the bundle job is suspended, running coordinators will be suspended. If the bundle job is in RUNNING status, it will transit to SUSPENDED status; if it is in RUNNINGWITHERROR status, it will transit to SUSPENDEDWITHERROR ; if it is in PREP status, it will transit to PREPSUSPENDED status.
恢复作业
$ oozie job -oozie http://localhost:11000/oozie -resume 14-20090525161321-oozie-joe
The suspend option suspends a coordinator/bundle job in SUSPENDED , SUSPENDEDWITHERROR or PREPSUSPENDED status. If the coordinator job is in SUSPENDED status, it will transit to RUNNING status; if it is in SUSPENDEDWITHERROR status, it will transit to RUNNINGWITHERROR ; if it is in PREPSUSPENDED status, it will transit to PREP status.
When the coordinator job is resumed it will create all the coordinator actions that should have been created during the time it was suspended, actions will not be lost, they will delayed.
When the bundle job is resumed, suspended coordinators will resume running. If the bundle job is in SUSPENDED status, it will transit to RUNNING status; if it is in SUSPENDEDWITHERROR status, it will transit to RUNNINGWITHERROR ; if it is in PREPSUSPENDED status, it will transit to PREP status.
Kill作业:KILLED
$ oozie job -oozie http://localhost:11000/oozie -kill 14-20090525161321-oozie-joe
Kill Coordinator 活动或多个活动
$oozie job -kill <coord_Job_id> [-action 1, 3-4, 7-40] [-date 2009-01-01T01:00Z::2009-05-31T23:59Z, 2009-11-10T01:00Z, 2009-12-31T22:00Z]
The kill option here for a range of coordinator actions kills a non-terminal (=RUNNING=, WAITING , READY , SUSPENDED ) coordinator action when coordinator job is not in FAILED or KILLED state.
Either -action or -date should be given.
If neither -action nor -date is given, the exception will be thrown. Also if BOTH -action and -date are given, an error will be thrown.
Multiple ranges can be used in -action or -date. See the above example.
If one of the actions in the given list of -action is already in terminal state, the output of this command will only include the other actions.
The dates specified in -date must be UTC.
Single date specified in -date must be able to find an action with matched nominal time to be effective.
After the command is executed the killed coordiator action will have KILLED status.
改变Coordinator作业 endtime/concurrency/pausetime/status ####
Changing endtime/pausetime of a Bundle Job
Changing endtime/pausetime of a Bundle Job
Rerunning a Workflow Job
Rerunning a Coordinator Action or Multiple Actions
Rerunning a Bundle Job
Checking the Information and Status of a Workflow, Coordinator or Bundle Job or a Coordinator Action
Listing all the Workflows for a Coordinator Action
Checking the xml definition of a Workflow, Coordinator or Bundle Job
Checking the server logs of a Workflow, Coordinator or Bundle Job
Checking the server logs for particular actions of a Coordinator Job
Filtering the server logs with logfilter options
Dryrun of Coordinator Job
Dryrun of Workflow Job
Updating coordinator definition and properties
Ignore a Coordinator Job
Ignore a Coordinator Action or Multiple Coordinator Actions
Jobs Operations
Checking the Status of multiple Workflow Jobs
Checking the Status of multiple Coordinator Jobs
Checking the Status of multiple Bundle Jobs
Bulk monitoring for jobs launched via Bundles
Admin Operations
Checking the Status of the Oozie System
Changing the Status of the Oozie System
Displaying the Build Version of the Oozie System
Displaying the queue dump of the Oozie System
Displaying the list of available Oozie Servers
Validate Operations
Validating a Workflow XML
Getting list of available sharelib
Update system sharelib
SLA Operations Getting a list of SLA events Getting the SLA event with particular sequenceID Getting information about SLA events using filter